NSF PAR Search | NSF Public Access Repository

Provably Efficient Lifelong Reinforcement Learning with Linear Representation

Amani, Sanae; Yang, Lin; Cheng, Ching-An (May 2023, ICLR)

Full Text Available
Hindsight learning for MDPs with exogenous inputs

Sinclair, Sean R.; Frujeri, Felipe; Cheng, Ching-An; Marshall, Luke; Barbalho, Hugo; Li, Jingling; Neville, Jennifer; Menache, Ishai; Swaminathan, Adith (July 2023, Proceedings of Machine Learning Research)

Many resource management problems require sequential decision-making under uncertainty, where the only uncertainty affecting the decision outcomes comes from exogenous variables outside the control of the decision-maker. We model these problems as Exo-MDPs (Markov Decision Processes with Exogenous Inputs) and design a class of data-efficient algorithms for them termed Hindsight Learning (HL). Our HL algorithms achieve data efficiency by leveraging a key insight: given samples of the exogenous variables, past decisions can be revisited in hindsight to infer counterfactual consequences that can accelerate policy improvements. We compare HL against classic baselines in the multi-secretary and airline revenue management problems. We also scale our algorithms to a business-critical cloud resource management problem – allocating Virtual Machines (VMs) to physical machines, and simulate their performance with real datasets from a large public cloud provider. We find that HL algorithms outperform domain-specific heuristics, as well as state-of-the-art reinforcement learning methods.
Full Text Available
Adversarially Trained Actor Critic for Offline Reinforcement Learning

Cheng, Ching-An; Xie, Tengyang; Jiang, Nan; Agarwal, Alekh (July 2022, Proceedings of the 39th International Conference on Machine Learning)

We propose Adversarially Trained Actor Critic (ATAC), a new model-free algorithm for offline reinforcement learning (RL) under insufficient data coverage, based on the concept of relative pessimism. ATAC is designed as a two-player Stackelberg game: A policy actor competes against an adversarially trained value critic, who finds data-consistent scenarios where the actor is inferior to the data-collection behavior policy. We prove that, when the actor attains no regret in the two-player game, running ATAC produces a policy that provably 1) outperforms the behavior policy over a wide range of hyperparameters that control the degree of pessimism, and 2) competes with the best policy covered by data with appropriately chosen hyperparameters. Compared with existing works, notably our framework offers both theoretical guarantees for general function approximation and a deep RL implementation scalable to complex environments and large datasets. In the D4RL benchmark, ATAC consistently outperforms state-of-the-art offline RL algorithms on a range of continuous control tasks.
Full Text Available
Bellman-consistent Pessimism for Offline Reinforcement Learning

Xie, Tengyang; Cheng, Ching-An; Jiang, Nan; Mineiro, Paul; Agarwal, Alekh (December 2021, Advances in Neural Information Processing Systems (selected for oral presentation))

Full Text Available
Accelerating Imitation Learning with Predictive Models

Cheng, Ching-An; Yan, Xinyan; Theodorou, Evangelos; Boots, Byron (January 2019, Proceedings of Machine Learning Research)

Full Text Available
Truncated Back-propagation for Bilevel Optimization

Shaban, Amirreza; Cheng, Ching-An; Hatch, Nathan; Boots, Byron (January 2019, Proceedings of Machine Learning Research)

Full Text Available
Predictor-Corrector Policy Optimization

Cheng, Ching-An; Yan, Xinyan; Ratliff, Nathan; Boots, Byron (January 2019, Proceedings of the International Conference on Machine Learning)

Full Text Available
Convergence of Value Aggregation for Imitation Learning

Cheng, Ching-An; Boots, Byron (January 2018, Proceedings of Machine Learning Research)

Full Text Available
Variational Inference for Gaussian Process Models with Linear Complexity

Cheng, Ching-An; Boots, Byron (October 2017, Advances in Neural Information Processing Systems)

Full Text Available
Fast Policy Learning through Imitation and Reinforcement

Cheng, Ching-An; Yan, Xinyan; Wagener, Nolan; Boots, Byron (January 2018, Uncertainty in Artificial Intelligence)

Full Text Available
